Kubernetes 1.30: The Improvements Operators Actually Appreciate

Stacked industrial shipping containers, representing Kubernetes workloads

Kubernetes 1.30 landed in April 2024 and, unlike the August 1.31 release that grabbed the headlines with GA AppArmor, slipped by quietly. That is a pity, because for anyone actually running clusters in production, 1.30 is exactly the kind of release that matters: few headline changes, many tightened seams. This post walks through what is genuinely worth knowing and, more importantly, what changes day-to-day for an operator, rather than rereading the changelog.

ValidatingAdmissionPolicy goes GA: the headline change

The promotion of ValidatingAdmissionPolicy to GA is, by a wide margin, the most operationally impactful improvement in this release. For years, any non-trivial admission rule —reject a Deployment with more than N replicas, enforce mandatory labels, block images without pinned tags— required an external webhook: an HTTPS service with its own certificate, its own HA story, added latency on every API call, and its own failure mode. If the webhook got slow or crashed, the control plane degraded with it.

With 1.30, the declarative CEL-based (Common Expression Language) model stops being experimental and becomes a serious option. You define the policy as just another cluster resource, the API server evaluates it in-process, and the whole webhook scaffolding disappears. For simple policies —which is the vast majority in practice— you replace several hundred lines of Go, a Deployment, a Service, and a cert-manager Certificate with a twenty-line YAML. Latency drops, the failure surface shrinks, and platform teams can review policies as ordinary pull requests without having to audit code.

apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicy
metadata:
  name: limit-replicas
spec:
  failurePolicy: Fail
  matchConstraints:
    resourceRules:
      - apiGroups:   ["apps"]
        apiVersions: ["v1"]
        operations:  ["CREATE", "UPDATE"]
        resources:   ["deployments"]
  validations:
    - expression: "object.spec.replicas <= 5"
      message: "Deployments cannot exceed 5 replicas in this cluster."

The policy is paired with a ValidatingAdmissionPolicyBinding that decides which namespaces it applies to, which enables the classic “strict in prod, permissive in dev” pattern without duplicating logic. Webhooks are still needed when a rule must consult external state (an inventory database, a cost API, a licensing service), but everything else can and should migrate.
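As a sketch of that pairing (the binding name and the `env: prod` namespace label are assumptions, not from the release notes), a binding that enforces the policy above only in production-labelled namespaces might look like:

```yaml
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicyBinding
metadata:
  name: limit-replicas-prod
spec:
  policyName: limit-replicas        # references the ValidatingAdmissionPolicy defined above
  validationActions: ["Deny"]       # reject violating requests outright (alternatives: Warn, Audit)
  matchResources:
    namespaceSelector:
      matchLabels:
        env: prod                   # hypothetical label; dev namespaces stay unrestricted
```

A second binding with `validationActions: ["Warn"]` against dev namespaces gives the "strict in prod, permissive in dev" split from the same policy object, with no duplicated CEL.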

Pod scheduling readiness: controlling when a Pod enters the queue

The other genuinely useful improvement is pod scheduling readiness, now stable. The problem it solves is subtle but real: until now, a newly created Pod entered the scheduler queue immediately, even if its dependencies were not yet resolved —a PVC pending slow provisioning, an external quota waiting for validation, a permission that some controller had to grant first. The result was failed scheduling cycles, exponential backoff, and noisy logs.

With scheduling gates, a controller can create the Pod with a closed gate (spec.schedulingGates) and remove it only when the Pod is ready to compete for a node. That opens the door —quite literally— to real batch schedulers: Kueue, Volcano, and similar can now implement fair-share, gang scheduling, or dynamically prioritised queues without fighting the native scheduler. For an operator who does not run batch workloads, the direct impact is small; for anyone managing clusters with ML training, data pipelines, or rendering farms, it is transformative.
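A minimal sketch of the pattern (the gate name `example.com/quota-check` and the pod name are illustrative; any qualified name works):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gated-pod
spec:
  schedulingGates:
    - name: example.com/quota-check   # hypothetical gate; the pod reports SchedulingGated
  containers:
    - name: main
      image: registry.k8s.io/pause:3.9
```

The pod stays out of the scheduling queue until the owning controller removes the gate, for example with `kubectl patch pod gated-pod --type=json -p '[{"op": "remove", "path": "/spec/schedulingGates/0"}]'`. Gates can only be removed after creation, never added, so a pod cannot be pulled back out of the queue once released.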

Job success policy and PVC retention: small but sharp

The new successPolicy on Jobs looks like a curiosity until you try running ML training with parallel indexed jobs and discover that the “all indices must complete” semantics does not match reality: often it is enough for N of M indices to succeed for the result to be valid. Declaring this on the Job itself is far cleaner than wrapping it in an external controller.
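A hedged sketch of the idea (in 1.30 this sits behind the alpha `JobSuccessPolicy` feature gate; names, counts, and the image are placeholders): an indexed job of five workers that is declared successful once any three indices complete.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: partial-training
spec:
  completionMode: Indexed     # successPolicy requires indexed jobs
  completions: 5
  parallelism: 5
  successPolicy:
    rules:
      - succeededCount: 3     # the Job succeeds once 3 of the 5 indices finish
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: worker
          image: busybox:1.36   # placeholder image
          command: ["sh", "-c", "echo training shard $JOB_COMPLETION_INDEX"]
```

Once the rule is met, the remaining pods are terminated and the Job is marked Complete, instead of waiting for all five indices.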

On the storage side, improvements to PersistentVolumeClaim retention policy close a long-standing gap: the lifecycle of PVCs auto-created by a StatefulSet can now be bound to the owning StatefulSet, preventing the orphan PVCs that accumulated after every kubectl delete sts. Small, but it removes a recurring cleanup chore.
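The relevant knob is `persistentVolumeClaimRetentionPolicy` on the StatefulSet spec. A sketch (StatefulSet name, image, and sizes are assumptions):

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: db
spec:
  serviceName: db
  replicas: 3
  persistentVolumeClaimRetentionPolicy:
    whenDeleted: Delete   # remove the auto-created PVCs when the StatefulSet is deleted
    whenScaled: Retain    # keep PVCs on scale-down so data survives a later scale-up
  selector:
    matchLabels:
      app: db
  template:
    metadata:
      labels:
        app: db
    spec:
      containers:
        - name: db
          image: postgres:16   # placeholder
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 10Gi
```

The default for both fields is `Retain`, which preserves the old behaviour; `whenDeleted: Delete` is what stops the orphan PVCs from piling up after `kubectl delete sts`.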

What probably does not affect you

Some well-advertised improvements are, for a typical operator, noise. Node swap support (beta) remains territory best avoided outside very specific cases: the reasons we disabled swap in Kubernetes have not gone away. Structured logging phase 2 is positive but invisible unless you are tooling against control-plane logs. Contextual logging matters if you operate the control plane, not if you consume it.

Removals do deserve attention: the in-tree azureDisk driver is gone (you should already be on the CSI driver), the vSphere in-tree driver is deprecated, and SecurityContextDeny admission is fully retired. If a cluster is still on 1.28 or earlier, this is the moment to audit which in-tree plugins remain in use, before the migration becomes urgent.

Compatibility and upgrade

The upgrade path is the usual one: kubeadm upgrade plan followed by apply on the first control plane, then a rolling upgrade across the rest and node-by-node kubelet. containerd 1.7.x and CRI-O 1.30 are supported; etcd 3.5.x remains the recommended version. Ecosystem components (kube-prometheus-stack, cert-manager, Calico, Cilium, Flannel) were ready the day after release. Managed providers arrived with the usual lag: GKE first, EKS and AKS one or two months later.
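As a sketch of that sequence (the patch version and package pins are illustrative placeholders; exact package names depend on your distro and on pkgs.k8s.io repo setup):

```shell
# On the first control-plane node:
sudo kubeadm upgrade plan                  # lists the available 1.30.x targets
sudo kubeadm upgrade apply v1.30.2         # version is a placeholder

# On each remaining control-plane node:
sudo kubeadm upgrade node

# Then, worker by worker: drain, upgrade kubelet, uncordon.
kubectl drain <node> --ignore-daemonsets
sudo apt-get install -y kubelet kubectl    # pin to the matching 1.30.x package version
sudo systemctl restart kubelet
kubectl uncordon <node>
```

The usual skew rules apply: kubelet may trail the API server by up to three minor versions, but kubeadm itself should match the target control-plane version before you run the upgrade.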

Bottom line

Kubernetes 1.30 is the release that helps you keep the cluster lean. GA ValidatingAdmissionPolicy alone is enough reason to plan the upgrade: every webhook you can retire is one fewer component to maintain, monitor, and patch. Pod scheduling readiness is the foundation serious batch schedulers will build on, even if its immediate benefit depends on your workload. The rest are well-polished details that reduce operational friction quietly. For those on 1.29 the jump is calm; for those still on 1.28 or earlier, every release skipped turns the next jump into a bigger project. This is a good window to catch up.
